How to Address the Data Quality Issues in Regression Models: A Guided Process for Data Cleaning
Today, data availability has gone from scarce to superabundant. Technologies like IoT, trends in social media, and the capabilities of smartphones are producing and digitizing large amounts of data that were previously unavailable. This massive increase in data creates opportunities for new business models, but also demands new techniques and methods for data quality in knowledge discovery, especially when the data comes from different sources (e.g., sensors, social networks, cameras, etc.). The data quality process of a data set supports conclusions about the information it contains. This is increasingly done with the aid of data cleaning approaches. Guaranteeing high data quality is therefore considered the primary goal of the data scientist. In this paper, we propose a process for data cleaning in regression models (DC-RM). The proposed data cleaning process is evaluated on real datasets from the UCI Repository of Machine Learning Databases. To assess the data cleaning process, the datasets cleaned by DC-RM were used to train the same regression models proposed by the authors of the UCI datasets. The results achieved by the models trained on the datasets produced by DC-RM are better than or equal to those reported by the datasets' authors. This work has also been supported by the Spanish Ministry of Economy, Industry and Competitiveness (Projects TRA2015-63708-R and TRA2016-78886-C3-1-R).
Aprendizaje automático en conjuntos de clasificadores heterogéneos y modelado de agentes (Machine learning in heterogeneous classifier ensembles and agent modeling)
One of the fastest-growing areas of machine learning in recent years is the one in
which the decisions of individual classifiers are combined, so that the final decision
about which class an example belongs to is made by an ensemble of classifiers. There
are various techniques for generating ensembles of classifiers, from manipulating the
input data to using meta-learning. One way of classifying these techniques is by the
number of different learning algorithms they use to generate the members of the
ensemble. Techniques that use a single algorithm to generate all the members are said
to generate a homogeneous ensemble. Techniques that use more than one algorithm to
generate the classifiers, on the other hand, are considered to generate a heterogeneous
ensemble of classifiers. Among the algorithms for generating heterogeneous ensembles is
Stacking, which, besides generating the ensemble's classifiers from different learning
algorithms, uses two levels of learning. The first learning level, or level-0, uses the
domain data directly, whereas the meta-level, or level-1, uses data generated from the
level-0 classifiers.
A problem inherent to Stacking is determining the configuration of the algorithm's
learning parameters, among them which and how many algorithms should be used to
generate the ensemble's classifiers. Previous work has determined that there is no
exact number of algorithms that is optimal for all domains. Nor is it well defined
which algorithms should be used, although some works use representative algorithms
of each type.
One of the goals of this doctoral thesis is to use genetic algorithms as an
optimization technique to determine which algorithms should be used to generate the
ensemble of classifiers, as well as the configuration of their learning parameters.
The proposed method is thus domain independent, whereas the Stacking parameter
configuration it finds would depend on the domain.
The growth of e-commerce and of applications on the World-Wide-Web has led to an
increase in environments in which agents take part. These environments include
competitive and/or collaborative situations in which knowledge about the individuals
involved provides a clear advantage when deciding which action to take. There are
various ways of acquiring this knowledge. One of them is by modeling the behavior of
the agents.
In turn, there are various ways of building a model of an agent. Some techniques use
previously built models and aim to match the observed behavior with an existing model.
Other techniques assume optimal behavior on the part of the agent being modeled in
order to create a model of its behavior.
A second goal of this doctoral thesis is to create a general framework for agent
modeling based on observing the behavior of the agent being modeled. To this end,
machine learning techniques are proposed to carry out the modeling task based on the
relationship between the agent's input and output.
____________________________________________
In recent years, one of the most active research areas in Machine Learning
is that of ensembles of classifiers. Their purpose is to combine the decisions of
individual classifiers so that all classifiers in the ensemble are taken into account
in order to classify new instances. There are many techniques that generate such
ensembles. Some manipulate the input data, while others use meta-learning. In general,
ensembles can be homogeneous or heterogeneous. Homogeneous ensembles
consist of several classifiers generated by the same learning technique, whereas
heterogeneous ensembles contain classifiers generated by different algorithms. A
well-known approach to generate heterogeneous ensembles is Stacking. Stacking
uses two levels of learning. The first learning level or level-0 uses data directly from
the domain, whereas the meta-level or level-1 uses data generated by classifiers
from level-0.
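The two learning levels can be illustrated with a minimal sketch, assuming toy learners that are not taken from the thesis: `nearest_neighbour` and `nearest_centroid` stand in for level-0 algorithms, `majority_table` stands in for the level-1 learner, and `fit_stacking` is a hypothetical helper. The level-1 training data is built from predictions that level-0 models make on examples held out from their own training, which is one common way to set up Stacking.

```python
from collections import Counter

def nearest_neighbour(train):
    """Level-0 learner: 1-nearest neighbour on squared Euclidean distance."""
    def predict(x):
        closest = min(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
        return closest[1]
    return predict

def nearest_centroid(train):
    """Level-0 learner: pick the class whose mean vector is closest."""
    sums, counts = {}, Counter()
    for x, y in train:
        counts[y] += 1
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
    def predict(x):
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))
    return predict

def majority_table(meta_train):
    """Level-1 learner: map each tuple of level-0 predictions to its most frequent true class."""
    table = {}
    for preds, y in meta_train:
        table.setdefault(preds, Counter())[y] += 1
    default = Counter(y for _, y in meta_train).most_common(1)[0][0]
    def predict(preds):
        votes = table.get(preds)
        return votes.most_common(1)[0][0] if votes else default
    return predict

def fit_stacking(data, level0_learners, meta_learner, folds=3):
    # Level-1 data: each example is described by the predictions of
    # level-0 models that did not see it during training.
    meta_train = []
    for k in range(folds):
        held_out = data[k::folds]
        rest = [ex for i, ex in enumerate(data) if i % folds != k]
        models = [learn(rest) for learn in level0_learners]
        for x, y in held_out:
            meta_train.append((tuple(m(x) for m in models), y))
    meta_model = meta_learner(meta_train)
    # Final level-0 models are retrained on all the domain data.
    final_models = [learn(data) for learn in level0_learners]
    def predict(x):
        return meta_model(tuple(m(x) for m in final_models))
    return predict

# Toy, made-up domain data: two well-separated classes in the plane.
toy_data = [((0.0, 0.1), "a"), ((0.2, 0.0), "a"), ((0.1, 0.3), "a"),
            ((1.0, 1.1), "b"), ((0.9, 1.0), "b"), ((1.2, 0.8), "b")]
stacked = fit_stacking(toy_data, [nearest_neighbour, nearest_centroid], majority_table)
print(stacked((0.1, 0.1)), stacked((1.0, 1.0)))  # a b
```

Because both level-0 learners share the same interface, the ensemble stays heterogeneous while the level-1 learner only ever sees their predictions, never the domain attributes.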
A problem inherent to Stacking is determining the right configuration of the
learning parameters, such as how many classifiers, and which learning algorithms,
must be used in the generation of the ensemble of classifiers. Previous work has
shown that there is no optimal decision for all the domains, although there are
works that use representative algorithms from each type.
One goal of this thesis is to use Genetic Algorithms as an optimization technique
in order to determine the type and number of algorithms to be used to generate
the ensemble of classifiers, as well as the configuration of the learning parameters
of these algorithms. The proposed method is domain independent, and the Genetic
Algorithm will be able to adapt to particular domains.
The growth of e-commerce and applications over the World-Wide-Web
has motivated the increase of environments where agents can interact. These environments
include competitive and/or collaborative situations where the knowledge
about other individuals involved in the environment provides a clear advantage
when making decisions about actions to perform. There are several ways to acquire
this knowledge. One of them is by modeling the behavior of other agents.
There are several ways to construct an agent’s model. Some techniques use
previously constructed models and aim to match the observed behavior with
an existing model. Other techniques assume that the agent to model carries out an
optimal strategy in order to create a model of its behavior.
In this thesis, a second approach to model agents will be used based on the
observation of other agents' behavior. In order to do this, a general framework that
uses machine learning techniques for agent modeling is proposed.
IoT application for energy poverty detection based on thermal comfort monitoring
The development of a datalogger for identifying Energy Poverty (EP) using thermal comfort monitoring is described in this work. There is no uniform definition of EP, and there are no global recommendations indicating the thermal comfort characteristics that should be used to identify EP. Most Internet of Things (IoT)-based systems designed for EP identification measure energy consumption (electricity and gas). There is a lack of works that use IoT-based systems to identify EP through the monitoring of thermal comfort parameters. To address the deficiencies discovered in the identification of EP from the perspective of thermal efficiency, an IoT-based monitoring system was designed, developed, and tested. A first pilot was installed in a household in Getafe. A full month of temperature, relative humidity, and CO2 concentration measurements was used to evaluate the system, which was then compared to a commercial system. The results revealed that the new IoT-based approach was very dependable and may be used to accurately monitor EP-related parameters. This work was supported by the European Commission through Urban Innovative Actions of the EPIU Getafe Project under Grant UIA04-212. The work of Dr. Agapito Ledezma was supported by the Agencia Estatal de Investigación (AEI) under Grant PID2021-124335OB-C22.
From continuous behaviour to discrete knowledge
Proceeding of: 7th International Work-Conference on Artificial and Natural Neural Networks, IWANN 2003, Maó, Menorca, Spain, June 3-6, 2003, Proceedings, Part II. Neural networks have proven to be very powerful techniques for solving a wide range of tasks. However, the learned concepts are unreadable for humans. Some works try to obtain symbolic models from the networks once they have been trained, so that the model can be understood by means of decision trees or rules that are closer to human understanding. The main problem of this approach is that neural networks output a continuous range of values, so even though a symbolic technique could be used to work with continuous classes, this output would still be hard for humans to understand. In this work, we present a system that is able to model a neural network's behaviour by discretizing its outputs with a vector quantization approach, allowing to apply the symbolic method
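The discretization step can be sketched as follows, under the assumption that the network has a single scalar output and that the codebook is learned with Lloyd's algorithm (1-D k-means). The abstract does not say which vector quantization method the authors used, so `fit_codebook` and `quantize` are hypothetical names for illustration only. Each continuous output is replaced by the index of its closest prototype, and that index can then serve as a discrete class for a symbolic learner.

```python
def quantize(value, codebook):
    """Return the index of the closest prototype, used as a discrete class label."""
    return min(range(len(codebook)), key=lambda i: abs(value - codebook[i]))

def fit_codebook(values, k, iterations=20):
    """Lloyd's algorithm in one dimension: returns k sorted prototype values."""
    ordered = sorted(values)
    # Start the prototypes at evenly spaced positions of the sorted sample.
    codebook = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        cells = [[] for _ in range(k)]
        for v in values:
            cells[quantize(v, codebook)].append(v)
        # Move each prototype to the mean of its cell; keep it if the cell is empty.
        codebook = [sum(c) / len(c) if c else p for c, p in zip(cells, codebook)]
    return sorted(codebook)

# Made-up continuous outputs of a trained network, for illustration only.
outputs = [0.02, 0.05, 0.08, 0.48, 0.5, 0.55, 0.93, 0.97, 0.99]
codebook = fit_codebook(outputs, k=3)
labels = [quantize(v, codebook) for v in outputs]
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The number of prototypes `k` controls how coarse the resulting symbolic classes are, and would have to be chosen for each task.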
Heuristic search-based stacking of classifiers
Currently, the combination of several classifiers is one of the most active fields within inductive learning. Examples of such techniques are boosting, bagging and stacking. From these three techniques, stacking is perhaps the least used one. One of the main reasons for this relates to the difficulty to define and parameterize its components: selecting which combination of base classifiers to use, and which classifiers to use as the meta-classifier. The approach we present in this chapter poses this problem as an optimization task, and then uses optimization techniques based on heuristic search to solve it. In particular, we apply genetic algorithms to automatically obtain the ideal combination of learning methods for the stacking system.
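A minimal sketch of how such a search could be set up, assuming a bit-mask encoding in which gene i says whether candidate base learner i is included in the ensemble. The encoding, the operators (binary tournament selection, one-point crossover, bit-flip mutation, elitism), the `evolve` function and the `toy_fitness` stand-in are all assumptions for illustration, not the chapter's actual design. For brevity, the choice of meta-classifier is left out of the chromosome.

```python
import random

# Made-up contribution of each of six hypothetical candidate learners.
TOY_GAIN = [0.30, 0.05, 0.25, 0.02, 0.20, 0.01]

def toy_fitness(mask):
    """Stand-in for a real evaluation such as the cross-validated accuracy of
    the stacking system encoded by mask; penalizes large ensembles."""
    return sum(g for g, bit in zip(TOY_GAIN, mask) if bit) - 0.04 * sum(mask)

def evolve(fitness, n_genes, pop_size=20, generations=30, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: the best individual always survives
        while len(next_pop) < pop_size:
            # Binary tournament selection of two parents.
            a = max(rng.sample(pop, 2), key=fitness)
            b = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, n_genes)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [g ^ 1 if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

best_mask, best_history = evolve(toy_fitness, n_genes=len(TOY_GAIN))
print(best_mask, round(best_history[-1], 3))
```

In a real setting the fitness evaluation dominates the cost, since every chromosome requires training and validating a full stacking system.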
On the practical nature of artificial qualia
Proceeding of: 2010 Annual Convention of the Society for the Study of Artificial Intelligence and Simulation of Behaviour (AISB 2010), Leicester, UK, 29 March - 1 April, 2010. Can machines ever have qualia? Can we build robots with inner worlds of subjective experience? Will qualia experienced by robots be comparable to subjective human experience? Is the young field of Machine Consciousness (MC) ready to answer these questions? In this paper, rather than trying to answer these questions directly, we argue that a formal definition, or at least a functional characterization, of artificial qualia is required in order to establish valid engineering principles for synthetic phenomenology (SP). Understanding what might be the differences, if any, between natural and artificial qualia is one of the first questions to be answered. Furthermore, if an interim and less ambitious definition of artificial qualia can be outlined, the corresponding model can be implemented and used to shed some light on the very nature of consciousness. In this work we explore current trends in MC and SP from the perspective of artificial qualia, attempting to identify key features that could contribute to a practical characterization of this concept. We focus specifically on potential implementations of artificial qualia as a means to provide a new interdisciplinary tool for research on natural and artificial cognition. This work was supported in part by the Spanish Ministry of Education under CICYT grant TRA2007-67374-C02-02.
Criteria for consciousness in artificial intelligent agents
Proceeding of: Adaptive Learning Agents and Multi-Agent Systems, ALAMAS+ALAg 2008 – Workshop at AAMAS 2008, Estoril, May, 12, 2008, Portugal. Accurately testing for consciousness is still an unsolved problem when applied to humans and other mammals. The inherent subjective nature of conscious experience makes it virtually unreachable by classic empirical approaches. Therefore, alternative strategies based on behavior analysis and neurobiological studies are being developed in order to determine the level of consciousness of biological organisms. However, these methods cannot be directly applied to artificial systems. In this paper we propose both a taxonomy and some functional criteria that can be used to assess the level of consciousness of an artificial intelligent agent. Furthermore, a list of measurable levels of artificial consciousness, ConsScale, is defined as a tool to determine the potential level of consciousness of an agent. Both the mapping of consciousness to AI and the role of consciousness in cognition are controversial and unsolved questions; in this paper we aim to approach these issues with the notions of I-Consciousness and embodied intelligence. This research has been supported by the Spanish Ministry of Education and Science under project TRA2007-67374-C02-02.
Strategies for measuring machine consciousness
The accurate measurement of the level of consciousness of a creature remains a major scientific challenge; nevertheless, a number of new accounts that attempt to address this problem have been proposed recently. In this paper we analyze the principles of these new measures of consciousness along with other classical approaches, focusing on their applicability to Machine Consciousness (MC). Furthermore, we propose a set of requirements for what we think a suitable measure for MC should be, discussing the associated theoretical and practical issues. Using the proposed requirements as a framework for the design of an integrative measure of consciousness, we explore the possibility of designing such a measure in the context of the current state of the art in consciousness studies. This work has been supported by the Grant CICYT TRA-2007-67374-C02-02.
Towards the generation of visual qualia in artificial cognitive architectures
Proceeding of: Brain Inspired Cognitive Systems (BICS 2010). Madrid, Spain, 14-16 July, 2010. The nature and the generation of qualia in machines is a highly controversial issue. Even the existence of such a concept in the realm of artificial systems is often neglected or denied. In this work, we adopt a pragmatic approach to this problem using the Synthetic Phenomenology perspective. Specifically, we explore the generation of visual qualia in an artificial cognitive architecture inspired by the Global Workspace Theory (GWT). We argue that preliminary results obtained as part of this research line will help to characterize and identify artificial qualia as the direct products of conscious perception in machines. Additionally, we provide a computational model for integrated covert and overt perception in the framework of the GWT. A simple form of the apparent motion effect is used as a preliminary experimental context and a practical case study for the generation of synthetic visual experience. Thanks to an internal inspection subsystem, we are able to analyze both covert and overt percepts generated by our system when confronted with visual stimuli. The inspection of the internal states generated within the cognitive architecture enables us to discuss possible analogies with human cognition processes. This work was supported in part by the Spanish Ministry of Education under CICYT grant TRA2007-67374-C02-02.
ConsScale: a plausible test for machine consciousness?
Proceeding of: the Nokia Workshop on Machine Consciousness (in 13th Finnish Artificial Intelligence Conference, STeP 2008), Helsinki, Finland, August 21-22, 2008. Is consciousness a binary on/off property? Or is it, on the contrary, a complex phenomenon that can be present in different states, qualities, and degrees? We support the latter and propose a linear incremental scale for consciousness applicable to artificial agents. ConsScale is a novel agent taxonomy intended to classify agents according to their level of consciousness. Even though testing for consciousness remains an open question in the domain of biological organisms, current biological approaches are reviewed, along with their possible adaptation to the realm of artificial agents. Regarding the always controversial problem of phenomenology, in this work we have adopted a purely functional approach, in which we have defined a set of architectural and behavioral criteria for each level of consciousness. Thanks to this functional definition of the levels, we aim to specify a set of tests that can be used to unambiguously determine the highest level of consciousness present in the artificial agent under study. Additionally, since a number of objections can presumably be raised against our proposal, we have considered the most obvious critiques and tried to offer reasonable rebuttals to them. Having neglected the phenomenological dimension of consciousness, our proposal might be considered reductionist and incomplete. However, we believe our account provides a valuable tool for assessing the level of consciousness of an agent, at least from a cognitive point of view. This research has also been supported by the Spanish Ministry of Education and Science under CICYT grant TRA2007-67374-C02-02.